ABSTRACT

An exact polyhedral model (PM) can be built in the general
case if the only control structures are do-loops and structured
ifs, and if loop counter bounds, array subscripts and
if-conditions are affine expressions of enclosing loop counters
and possibly some integer constants. In more general
dynamic control programs, where arbitrary ifs and whiles
are allowed, the usual dataflow analysis can, in general,
be only fuzzy. This is not a problem when the PM is used
just for guiding parallelizing transformations, but it is
insufficient for transforming source programs to other computation
models (CM) relying on the PM, such as our version
of the dataflow CM or the well-known KPN.

The paper presents a novel way of building the exact polyhedral
model and an extension of the concept of the exact
PM, which allowed us to add in a natural way all the processing
related to data-dependent conditions. Currently,
in our system, only arbitrary ifs (not whiles) are allowed
in input programs. The resulting polyhedral model can be
easily output as an equivalent program with dataflow
computation semantics.
1. INTRODUCTION

The exact Polyhedral Model (PM), a.k.a. Exact Array
Dataflow Analysis (EADA), can be built for a limited class
of programs. Normally, it embraces affine loop nests with
assignments in between, in which loop bounds and array
element indexes are affine expressions of surrounding loop
variables and fixed structure parameters (array sizes etc.).
If-statements with affine conditions are also allowed. Methods
of EADA are well developed for this class of
programs. The results are usually used for guiding parallelizing
transformations.
We define the PM (the formal definition is given below) as a mapping that
assigns to each read (load) instance in the computation its
unique write (store) instance that has written the value being
read. In other words, it is a collection of source functions,
each bound to a single read operation in a program. Such a
function takes the iteration vector of the read instance and produces
the name and the iteration vector of a write instance,
or the symbol $\bot$ indicating that such a write does not exist and the
original state of memory is read.

However, when the source program also contains one or
several if-statements with non-affine (e.g., data-dependent)
conditions, the known methods suggest only an approximation
which, generally, provides a set of possible writes for some
reads. It is usually referred to as Fuzzy Array Dataflow
Analysis (FADA). In some specific cases such a model
may provide a source function that uses as its input also
values of predicates associated with non-affine conditionals
in order to produce the unique source. These cases seem
to be those in which the number of such predicate values is
finite (uniformly bounded).

Usually, this does not pose a problem, as parallelization
can proceed relying on the approximate PM. But our aim
is to convert the source program (or a part of it) completely into the
dataflow computation model, and any approximation is
unacceptable for us. So, our task was to extend the class of
programs for which the exact PM can be built to programs with
non-affine conditionals. Our model representation language
is extended with predicate symbols corresponding to non-affine
Boolean expressions in the source code. From such a
model the exact source function for each read can be easily
extracted. These functions are, generally, recursive and they
depend on the usual affine parameters as well as on an unlimited
number of predicate values.

But the source function is not our aim. For building a
dataflow program we need the inverse, the use set function.
From the parallelization perspective this program implicitly carries
the maximum amount of parallelism of the source
program. A simple computation strategy (see Section 7.3)
exhibits all this parallelism. More details and references can
be found in . 

In this paper we briefly describe our original way of building
the dataflow model for affine programs and then extend
it to programs with non-affine conditionals. The affine class
and the affine solution tree are defined in Section 2. Sections
3-5 describe our algorithm of building the PM. Several examples
are presented in Section 6. Section 7 describes two
different semantics of the PM considered as a program. A later
section compares our approach and results with related ones.


2. SOME FORMALISM 

Consider a Fortran program fragment P. We are interested
in memory reads and writes which have the form of an
array element or a scalar variable. The latter will be treated
below as 0-dimensional arrays.

We define a computation graph by running the program
P with some input data. The graph consists of two kinds
of nodes: reads and writes, corresponding respectively to
individual executions of a load or store memory operation.
There is a link from a write w to a read r if r reads the
value written by w. Thus, r uses the same memory cell as
w, and w is the last write to this cell before r.
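The definition can be illustrated by a minimal sketch that builds the computation graph of one run by direct simulation. The summation loop, the labels S0 and S1 and the helper names are assumptions made for this illustration only; `None` stands for "no write", i.e. the initial memory contents are read.

```python
def trace_sum(n):
    """Simulate  S0: S = 0;  do i = 1, n;  S1: S = S + X(i);  enddo
    and record, for every read instance, the write instance it reads from."""
    last_write = {}   # memory cell -> (statement label, iteration vector)
    links = {}        # (label, iteration vector, cell) -> write instance or None

    def read(label, ivec, cell):
        # The source of a read is the last write to the same cell, if any.
        links[(label, ivec, cell)] = last_write.get(cell)

    def write(label, ivec, cell):
        last_write[cell] = (label, ivec)

    write("S0", (), ("S",))              # S0: S = 0
    for i in range(1, n + 1):
        read("S1", (i,), ("S",))         # S1 reads the scalar S
        read("S1", (i,), ("X", i))       # S1 reads X(i), never written here
        write("S1", (i,), ("S",))        # S1 writes S
    return links


links = trace_sum(3)
```

Each run yields one finite graph; the aim of the model described in this paper is a parametric description covering all such runs at once.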

Our purpose is to obtain a compact parametric description
of all computation graphs for a given program P. To make
such a description feasible we need to consider a limited class
of programs. It is the well-known affine class, which can be
formally defined by the set of rules shown in Fig. 1.

Λ                                 (empty statement)
A(i1, ..., ik) = e                (assignment, k ≥ 0)
X1; X2                            (sequence)
if c then X1; else X2; endif      (conditional)
do v = e1, e2; X; enddo           (do-loop)

Figure 1: Affine program constructors

The right-hand side e of an assignment may contain array
element accesses $A(i_1, \ldots, i_k)$, $k \ge 0$. All index expressions as
well as the bounds $e_1$ and $e_2$ of do-loops must be affine in surrounding
loop variables and structure parameters. Affine
expressions are those built from variables and integer constants
with addition, subtraction and multiplication by a literal
integer. Also, in an affine expression, we allow whole
division by a literal integer. The condition c must also be affine,
i.e., equivalent to $e = 0$ or $e > 0$ where e is affine.

Programs (or program parts) following these limitations
have been called static control programs (SCoP). Their
computation graph depends only on structure parameters
and does not depend on dynamic data values.

Below, we remove the restriction that the conditional
expression c must be affine. Such an extended program class
has been called weakly dynamic programs (WDP). Here
we shall allow only arbitrary ifs but not whiles, which will
be considered in the future.

A point in the computation trace of an affine program
may be identified as $(s, I_s)$, where s is a point in the program
and $I_s$ is the iteration vector, i.e., a vector of integer
values of all enclosing loop variables of point s. The list of
these variables will be denoted as $\mathbf{I}_s$, which allows us to depict
the point s itself as $(s, \mathbf{I}_s)$. (Here and below boldface symbols
denote variables or lists of variables as syntactic objects,
while normal italic symbols denote values, as usual.)

Thus, denoting an arbitrary read or write instance as
$(r, I_r)$ or $(w, I_w)$ respectively, we represent the whole
computation graph as a mapping

$$F_P : (r, I_r) \mapsto (w, I_w) \qquad (1)$$

which, for any read node $(r, I_r)$, yields the write node $(w, I_w)$
that has written the value being read, or yields $\bot$ if no such
write exists and thus the original contents of the cell are read.
This form of graph is called a source graph, or S-graph.

However, for translation to the dataflow computation model
we need the reversed map that, for each write node, finds
all read nodes which read the very value written. So, we
need the multi-valued mapping

$$G_P : (w, I_w) \mapsto \{(r, I_r)\} \qquad (2)$$

which for each write node $(w, I_w)$ yields the set of all read
nodes $\{(r, I_r)\}$ that read that very value. We call
this form of computation graph a use graph, or U-graph.
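For a single finite run, the passage from mapping (1) to mapping (2) is plain inversion of a finite map. The following sketch shows this on explicit data; the node names are hypothetical, and `None` again plays the role of "no write".

```python
from collections import defaultdict


def invert_source_graph(source):
    """Turn a map  read -> write (or None)  into a map  write -> set of reads."""
    uses = defaultdict(set)
    for read, write in source.items():
        if write is not None:      # reads of initial memory have no producer
            uses[write].add(read)
    return dict(uses)


source = {
    ("r1", (1,)): ("w0", ()),
    ("r1", (2,)): ("w1", (1,)),
    ("r2", (2,)): ("w1", (1,)),    # a second reader of the same value
    ("r3", (1,)): None,            # reads the original contents of a cell
}
uses = invert_source_graph(source)
```

The write `("w1", (1,))` has two readers here, which is why the use mapping has to be multi-valued even when the source mapping is single-valued.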

A subgraph of S-graph (U-graph) associated with a given 
read r (write w) will be called r-component (w-component). 

For each program statement (or point) s we define the
domain Dom(s) as the set of values of the iteration vector $I_s$
such that $(s, I_s)$ occurs in the computation. The following
proposition summarizes a well-established property of
static control programs (see, e.g., [10, 11]), which is also justified
by our algorithm.

Proposition 1. For any statement $(s, I_s)$ in a static control
program P its domain Dom(s) can be represented as a
finite disjoint union $\bigcup_i D_i$ such that each subdomain $D_i$
can be specified as a conjunction of affine conditions on variables
$\mathbf{I}_s$ and structure parameters, and, when the statement
is a read $(r, I_r)$, there exist such $D_i$ that the mapping $F_P$
on each subdomain $D_i$ can be represented as either $\bot$ or
$(w, (e_1, \ldots, e_m))$ for some write w, where each $e_j$ is an affine
expression of variables $\mathbf{I}_r$ and structure parameters.

This result suggests the idea of representing each r-component
of $F_P$ as a solution tree with affine conditions at branching
vertices and terms of the form $W\{e_1, \ldots, e_m\}$ or $\bot$ at leaves.
A similar concept of a quasi-affine solution tree, quast, was
suggested by P. Feautrier.

A single-valued solution tree (S-tree) is a structure used to
represent r-components of an S-graph. Its syntax is shown in
Fig. 2. It uses just linear expressions (L-expr) in conditions
and term arguments, so a special vertex type was introduced
in order to implement integer division.


S-tree  ::= ⊥
          | term
          | (cond → S-tree_t : S-tree_f)          (branching)
          | (L-expr =: num var + var → S-tree)    (integer division)
term    ::= name{L-expr_1, ..., L-expr_k}         (k ≥ 0)
var     ::= name
num     ::= ... | -2 | -1 | 0 | 1 | 2 | 3 | ...
cond    ::= L-cond | predicate                    (any condition)
L-cond  ::= L-expr = 0 | L-expr > 0               (affine condition)
L-expr  ::= num | num var + L-expr                (affine expression)
atom    ::= ⊥ | name{num_1, ..., num_k}           (ground term, k ≥ 0)

Figure 2: Syntax for single-valued solution tree

Given concrete integer values of all free variables of the
S-tree one can evaluate the tree to an atom. The two following
evaluation rules must be applied iteratively.

A branching like $(c \to T_1 : T_2)$ evaluates to $T_1$ if the conditional
expression c evaluates to true, otherwise to $T_2$. Non-affine
conditions are expressed by a predicate. This will be
explained below in Section 3.3.

A division $(e =: mq + r \to T)$ introduces two new variables
$(q, r)$ that take respectively the quotient and the remainder
of integer division of the integer value of e by the positive constant
integer m. The tree evaluates as T with the parameter list extended
with the values of these two new variables.
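The two evaluation rules can be sketched as a small interpreter. The tuple encoding of trees, the tags `"term"`, `"if"`, `"div"`, `"eq0"`, `"gt0"` and the function names are assumptions of this sketch, not the representation used in our system; bottom is encoded as `None`, and the remainder is taken in the range 0..m-1 (Python's `divmod`), which is also an assumption for negative dividends.

```python
def affine(expr, env):
    """Value of an affine expression given as (constant, {variable: coefficient})."""
    const, coeffs = expr
    return const + sum(c * env[v] for v, c in coeffs.items())


def eval_stree(tree, env):
    """Evaluate an S-tree to an atom: None (bottom) or (name, tuple_of_ints)."""
    while True:
        if tree is None:                      # leaf: bottom, i.e. no write
            return None
        tag = tree[0]
        if tag == "term":                     # leaf: name{e1, ..., ek}
            _, name, args = tree
            return (name, tuple(affine(a, env) for a in args))
        if tag == "if":                       # branching (c -> T1 : T2)
            _, (kind, expr), t_true, t_false = tree
            value = affine(expr, env)
            holds = (value == 0) if kind == "eq0" else (value > 0)
            tree = t_true if holds else t_false
        elif tag == "div":                    # division (e =: m*q + r -> T)
            _, expr, m, q, r, subtree = tree
            quotient, remainder = divmod(affine(expr, env), m)
            env = {**env, q: quotient, r: remainder}
            tree = subtree
        else:
            raise ValueError(f"unknown vertex kind: {tag!r}")


# (i - 1 > 0 -> S1{i - 1} : S0{})
source_tree = ("if", ("gt0", (-1, {"i": 1})),
               ("term", "S1", [(-1, {"i": 1})]),
               ("term", "S0", []))

# (i =: 2q + r -> (r = 0 -> Even{q} : bottom))
parity_tree = ("div", (0, {"i": 1}), 2, "q", "r",
               ("if", ("eq0", (0, {"r": 1})),
                ("term", "Even", [(0, {"q": 1})]),
                None))
```

With these trees, `eval_stree(source_tree, {"i": 4})` yields the atom `("S1", (3,))`, and `eval_stree(parity_tree, {"i": 7})` yields `None` because the remainder is nonzero.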




It follows from Proposition 1 that for an affine program
P the r-component of the S-graph $F_P$ for each read $(r, I_r)$
can be represented in the form of an S-tree T depending on
variables $\mathbf{I}_r$ and structure parameters.

However, the concept of S-tree is not sufficient for representing
w-components of a U-graph, because those must be
multi-valued functions in general. So, we extend the definition
of S-tree to the definition of a multi-valued tree, M-tree,
by two auxiliary rules shown in Fig. 3.

M-tree ::= ... the same as for S-tree ...
         | (& M-tree_1 ... M-tree_n)    (finite union, n ≥ 2)
         | (@ var → M-tree)             (infinite union)

Figure 3: Syntax for multi-valued tree

The semantics also changes. The result of evaluating an
M-tree is a set of atoms. The symbol $\bot$ now represents the empty
set, and a term $N\{\ldots\}$ represents a singleton.

To evaluate $(\&\, T_1 \ldots T_n)$ one must evaluate the sub-trees $T_i$
and take the union of all results. The result of evaluating
$(@v \to T)$ is mathematically defined as the union of an infinite
number of results of evaluating T with each integer value v
of variable $\mathbf{v}$. In practice the result of evaluating T is non-empty
only within some bounded interval of values v. In both
cases the united subsets are supposed to be disjoint.
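A set-valued evaluator for M-trees can be sketched in the same style. The encoding (tags `"union"` and `"all"`, bottom as `None`) is hypothetical; in particular, the `"all"` vertex in this sketch carries explicit affine bounds, reflecting the remark that the result is non-empty only within a bounded interval.

```python
def affine(expr, env):
    """Value of an affine expression given as (constant, {variable: coefficient})."""
    const, coeffs = expr
    return const + sum(c * env[v] for v, c in coeffs.items())


def eval_mtree(tree, env):
    """Evaluate an M-tree to a set of atoms (name, tuple_of_ints)."""
    if tree is None:                          # bottom is the empty set
        return set()
    tag = tree[0]
    if tag == "term":                         # a term is a singleton
        _, name, args = tree
        return {(name, tuple(affine(a, env) for a in args))}
    if tag == "if":
        _, (kind, expr), t_true, t_false = tree
        value = affine(expr, env)
        holds = (value == 0) if kind == "eq0" else (value > 0)
        return eval_mtree(t_true if holds else t_false, env)
    if tag == "div":
        _, expr, m, q, r, subtree = tree
        quotient, remainder = divmod(affine(expr, env), m)
        return eval_mtree(subtree, {**env, q: quotient, r: remainder})
    if tag == "union":                        # finite union (& T1 ... Tn)
        result = set()
        for subtree in tree[1:]:
            result |= eval_mtree(subtree, env)
        return result
    if tag == "all":                          # union over v in [lo, hi]
        _, var, lo, hi, subtree = tree
        result = set()
        for value in range(affine(lo, env), affine(hi, env) + 1):
            result |= eval_mtree(subtree, {**env, var: value})
        return result
    raise ValueError(f"unknown vertex kind: {tag!r}")


# (& (n - j > 0 -> R{j + 1} : bottom)  (@v in [1, n] -> Q{v, j}))
dest_tree = ("union",
             ("if", ("gt0", (0, {"n": 1, "j": -1})),
              ("term", "R", [(1, {"j": 1})]),
              None),
             ("all", "v", (1, {}), (0, {"n": 1}),
              ("term", "Q", [(0, {"v": 1}), (0, {"j": 1})])))
```

For `j = 2, n = 3` the tree evaluates to four destinations: `R{3}` and `Q{1,2}`, `Q{2,2}`, `Q{3,2}`.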

Below we present our algorithm of building an S-graph (Sections
3 and 4) and then a U-graph (Section 5).

3. BUILDING STATEMENT EFFECT 
3.1 Statement Effect and its Evaluation 

Consider a program statement X, which is a part of an
affine program P, and some k-dimensional array A. Let
$(w_A, I_{w_A})$ denote an arbitrary write operation on an element
of array A within a certain execution of statement X, or
the totality of all such operations. Suppose that the body
of X depends affine-wise on free parameters $p_1, \ldots, p_l$ (in
particular, they may include variables of loops surrounding
X in P). We define the effect of X over array A as a function

$$E_A[X] : (p_1, \ldots, p_l; q_1, \ldots, q_k) \mapsto (w_A, I_{w_A}) + \bot$$

that, for each tuple of parameters $p_1, \ldots, p_l$ and indexes
$q_1, \ldots, q_k$ of an element of array A, yields an atom $(w_A, I_{w_A})$
or $\bot$. The atom indicates that the write operation $(w_A, I_{w_A})$
is the last among those that write to element $A(q_1, \ldots, q_k)$
during the execution of X with affine parameters $p_1, \ldots, p_l$, and
$\bot$ means that there are no such operations.

The following claim is another form of Proposition 1: the
effect can be represented as an S-tree with program statement
labels as term names. We call such trees simply effect trees. (All
assignments are supposed to be labeled during preprocessing.)

Building the effect is the core of our approach. Using S-trees
as data objects, we implemented some operations on them
that are used in the algorithm presented in Fig. 4. A good
mathematical foundation of similar operations for similar 
trees has been presented in ]^. 

The algorithm goes upwards along the AST from primitives
like empty and assignment statements. Operation Seq
computes the effect of a statement sequence from the effects
of component statements. Operation Fold builds the effect of
a do-loop given the effect of the loop body. For an if-statement
with an affine condition the effect is built just by putting the
effects of the branches into a new conditional node.

E_A[Λ] = ⊥                                          (empty statement)
E_A[X1; X2] = Seq(E_A[X1], E_A[X2])                 (sequence)
E_A[LA : A(e1, ..., ek) = e] =                      (assignments to A)
    (q1 = e1 → ... (qk = ek → LA{I} : ⊥) ... : ⊥)
    where I is a list of all outer loop variables
E_A[LB : B(...) = e] = ⊥                            (other assignments)
E_A[if c then X1 else X2 endif] =                   (conditional)
    (c → E_A[X1] : E_A[X2])
E_A[do v = e1, e2; X; enddo] =                      (do-loop)
    Fold(v, e1, e2, E_A[X])

Figure 4: The rules for computing effect tree over
k-dimensional array A

The implementation of function Seq is straightforward. To compute
$\mathrm{Seq}(T_1, T_2)$ we simply replace all $\bot$-s in $T_2$ with a copy
of $T_1$. Then the result is simplified by a function Prune which
prunes unreachable branches by checking the consistency of
affine condition sets (the check is known as the Omega test [11]).
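The substitution step of Seq can be sketched as follows. The tuple encoding of trees is the hypothetical one used for illustration (bottom as `None`); the Prune step is deliberately omitted, and variables introduced by division vertices are assumed not to clash with variables of the first tree.

```python
def seq(t1, t2):
    """Effect of `X1; X2` from the effect t1 of X1 and the effect t2 of X2:
    every bottom leaf of t2 (cell not written by X2) falls back to t1."""
    if t2 is None:
        return t1
    tag = t2[0]
    if tag == "term":                 # X2 writes the cell: its write is the last one
        return t2
    if tag == "if":
        _, cond, t_true, t_false = t2
        return ("if", cond, seq(t1, t_true), seq(t1, t_false))
    if tag == "div":
        _, expr, m, q, r, subtree = t2
        return ("div", expr, m, q, r, seq(t1, subtree))
    raise ValueError(f"unknown vertex kind: {tag!r}")


# Effects over A of   L1: A(1) = ...   and   L2: A(2) = ...   (no loops):
t1 = ("if", ("eq0", (-1, {"q": 1})), ("term", "L1", []), None)   # (q - 1 = 0 -> L1{} : bottom)
t2 = ("if", ("eq0", (-2, {"q": 1})), ("term", "L2", []), None)   # (q - 2 = 0 -> L2{} : bottom)
both = seq(t1, t2)
```

Because the trees are immutable tuples, sharing `t1` between several leaves has the same effect as copying it.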

The operation $\mathrm{Fold}(v, e_1, e_2, T)$, where v is a variable and
$e_1$ and $e_2$ are affine expressions, produces an S-tree $T'$ that does
not contain v and represents the following function. Given
values of all other parameters, $T'$ evaluates to

$$\max\{v \in [e_1, e_2] \mid T(v) \text{ evaluates to a term}\} \qquad (3)$$

Making this $T'$ usually involves solving some 1-D parametric
integer programming problems and combining the results.
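The intended meaning of Fold can be checked pointwise by exhaustive search. The sketch below is not the parametric algorithm; it only computes the value of the folded tree for given concrete parameters. It reads formula (3) as "the result is the term produced at the greatest such v, or bottom if there is none", which is our interpretation for this illustration, and the body effect is passed as a plain Python function (a hypothetical stand-in for an S-tree).

```python
def fold_value(var, lo, hi, body_effect, env):
    """Value of Fold(var, lo, hi, T) at env, found by trying the iterations
    from the last one down to the first one."""
    for value in range(hi, lo - 1, -1):
        atom = body_effect({**env, var: value})
        if atom is not None:          # the latest iteration that writes the cell
            return atom
    return None                       # no iteration writes the cell


def shifted_write(env):
    """Body effect over A of  L: A(i + 1) = ...  i.e. (q - i - 1 = 0 -> L{i} : bottom)."""
    return ("L", (env["i"],)) if env["q"] == env["i"] + 1 else None


def scalar_write(env):
    """Body effect over a scalar written in every iteration: L{i}."""
    return ("L", (env["i"],))
```

For the loop `do i = 1, 5`, `fold_value("i", 1, 5, shifted_write, {"q": 3})` gives `("L", (2,))`, while the scalar case gives the write of the last iteration, `("L", (5,))`.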

3.2 Graph Node Structure 

In parallel with building the effect of each statement we
also compose a graph skeleton, which is a set of nodes with
placeholders for future links. For each assignment a separate
node is created. At this stage the graph nodes are associated
with the AST nodes, or statements, in which they were created,
for the purpose that will be explained in Section 4.2. The
syntax of a graph node description is presented in Fig. 5.

node        ::= (node (name context)
                      (dom conditions)
                      (ports ports)
                      (body computations)
                )
context     ::= names
port        ::= (name type source)
computation ::= (eval name type expression destination)
source      ::= S-tree | IN
destination ::= M-tree | OUT

Figure 5: Syntax for graph node description

Non-terminals ending with -s usually denote a repetition
of the base non-terminal, e.g., ports signifies a list of ports.
A node consists of a header with name and context, a domain
description, a list of ports that describe inputs, and a body
that describes the output result. The context here is just a list
of loop variables surrounding the current AST node. The
domain specifies a condition on these variables for which the
graph node instance exists. Besides context variables it may
depend on structure parameters. Ports and body describe
inputs and outputs. The source in a port initially is usually
an atom $A\{e_1, \ldots, e_k\}$ (or, generally, an S-tree) depicting
an array access $A(e_1, \ldots, e_k)$, which must be eventually resolved
into an S-tree referencing other graph nodes as the
true sources of the value (see Section 4.2). A computation
consists of a local name and type of an output value, an
expression to be evaluated, and a destination placeholder $\bot$
which must be replaced eventually by an M-tree that specifies
output links (see Section 5). The tag IN or OUT declares
the node as input or output respectively.

Consider the statement S=S+X(i) from the program in Fig. 8a.
The initial view of its graph node is shown in Fig. 6. The
expression in the eval clause is built from the right-hand side by replacing
all occurrences of scalars or array elements with their local
names (which become port names as well). A graph node for an
assignment has a single eval clause and acts as a generator
of values written by the assignment. Thus, a term of an
effect tree may be considered as a reference to a graph node.

(node (S1 i)
      (dom (i ≥ 1) (i ≤ n))
      (ports (s1 double S{}) (x1 double X{i}))
      (body (eval S double (s1 + x1) ⊥))
)

Figure 6: An initial view of graph node for statement
S=S+X(i)

3.3 Processing Non-affine Conditionals 

When the source program contains a non-affine conditional
statement X, special processing is needed. We add a
new kind of condition, a predicate function call, or simply
predicate, depicted as

$$\mathit{name}^{\mathit{sign}}\{\textit{L-exprs}\} \qquad (4)$$

that may be used everywhere in the graph where a normal
affine condition can. It contains a name, a sign T or F (affirmation
or negation) and a list of affine arguments.

However, not all operations can deal with such conditions 
in argument trees. In particular, the Fold cannot. Thus, 
we eliminate all predicates immediately after they appear in 
the effect tree of a non-affine conditional statement. 

First, we drag the predicate p, which is initially at the
top of the effect tree $E_A[X] = (p \to T_1 : T_2)$, downward to the
leaves. This rather straightforward process is accompanied
by pruning. In the resulting tree, $T_X$, all copies of predicate p
occur only in the lowest positions of the form $(p \to A_1 : A_2)$,
where each $A_i$ is either a term or $\bot$. We call such conditional
sub-trees atomic. In the worst case the number of atomic
sub-trees in the resulting tree is the product of the numbers of
atoms in sub-trees $T_1$ and $T_2$.

Second, each atomic sub-tree can now be regarded as an
indivisible composite value source. When one of $A_i$ is $\bot$, this
symbol depicts an implicit rewrite of an old value into the
target array cell $A(q_1, \ldots, q_k)$ rather than just "no write".
With this idea in mind we now replace each atomic sub-tree
U with a new term $U_{new}\{i_1, \ldots, i_n\}$ where the argument
list is just the list of variables occurring in the sub-tree U.
Simultaneously, we add the definition of $U_{new}$ in the form of
a graph node (associated with the conditional statement X
as a whole), which is shown in Fig. 7. Nodes of this kind will
be referred to as blenders, as they blend two input sources
into a single one.

(node (Unew i1 ... in)
      (dom Dom(X) + path-to-U-in-T_X)
      (ports (a t (p → RW(A1) : RW(A2))))
      (body (eval a t a ⊥))
)

Figure 7: Initial contents of the blender node for
atomic subtree U in E_A[X] = T_X

The domain of the new node is that of
statement X restricted by the conditions on the path to the
sub-tree U in the whole effect tree $T_X$. The result is defined
as just copying the input value a (of type t). The most
intriguing part is the source tree of the sole port a. It is obtained
from the atomic sub-tree $U = (p \to A_1 : A_2)$. Each $A_i$ is
replaced (by operator RW) as follows. When $A_i$ is a term it
remains unchanged. Otherwise, when $A_i$ is $\bot$, it is replaced
with an explicit reference to the array element being rewritten,
$A\{q_1, \ldots, q_k\}$. However, an issue arises: variables $q_1, \ldots, q_k$
are undefined in this context. The only variables allowed
here are $i_1, \ldots, i_n$ (and fixed structure parameters). Thus
we need to express indexes $q_1, \ldots, q_k$ through the "known" values
$i_1, \ldots, i_n$.

To resolve this issue, consider the list of (affine) conditions
L on the path to the subtree U in the whole effect tree
$T_X$ as a set of equations connecting variables $q_1, \ldots, q_k$ and
$i_1, \ldots, i_n$.

Proposition 2. Conditions L specify a unique solution
for values $q_1, \ldots, q_k$ depending on $i_1, \ldots, i_n$.

Proof. Consider the other branch $A_j$ of subtree U, which
must be a term. We prove a stronger statement, namely,
that given exact values of all free variables occurring in $A_j$,
Vars($A_j$), all q-s are uniquely defined. The term $A_j$ denotes
the source for array element $A(q_1, \ldots, q_k)$ within some
branch of the conditional statement X. Note, however, that
this concrete source is a write to a single array element only.
Hence, array element indexes $q_1, \ldots, q_k$ are defined uniquely
by Vars($A_j$). Now recall that all these variables are present
in the list $i_1, \ldots, i_n$ (by definition of this list). □

Now that the unique solution is known to exist, it can be easily
found by our affine machinery. See Section 5.1, in which the
machinery used for graph inversion is described.

Thus, we obtain, for the conditional statement X, an effect
tree that does not contain predicate conditions. All predicates
are hidden within new graph nodes. Hence we can
continue the process of building effects using the same operations
on trees as we did in the purely affine case. Also, for
each predicate condition a node must be created that evaluates
the predicate value. We shall return to processing
non-affine conditionals in Section 5.2.

4. EVALUATION AND USAGE OF STATES 
4.1 Computing States 

A state before statement $(s, I_s)$ in an affine program fragment
P with respect to array element $A(q_1, \ldots, q_k)$ is a
function that takes as arguments the iteration vector $I_s =
(i_1, \ldots, i_n)$, array indexes $(q_1, \ldots, q_k)$ and values of structure
parameters and yields the write $(w, I_w)$ in the computation
of P that is the last among those that write to array element
$A(q_1, \ldots, q_k)$ before $(s, I_s)$.

In other words, this function presents the effect (over array
A) of executing the program from the beginning up to the
point just before $(s, I_s)$. It can be represented as an S-tree,
$\Sigma_A[s]$, called a state tree at the program point before statement s
for array A. It can be computed with the following method.

For the starting point of program P we set

$$\Sigma_A[P] = (l_1 \le q_1 \le u_1 \to \ldots (l_k \le q_k \le u_k \to A_{ini}\{q_1, \ldots, q_k\} : \bot) \ldots : \bot) \qquad (5)$$

where the term $A_{ini}\{q_1, \ldots, q_k\}$ signifies an untouched value of
array element $A(q_1, \ldots, q_k)$ and $l_i$, $u_i$ are the lower and upper
bounds of the i-th array dimension (which must be affine
functions of fixed parameters). Thus, (5) means that all elements
of A are untouched before the whole program P.

The further computation of $\Sigma_A$ is described by the following
production rules:

1. Let $\Sigma_A[B_1; B_2] = T$. Then $\Sigma_A[B_1] = T$. The state
before any prefix of B is the same as that before B.

2. Let $\Sigma_A[B_1; B_2] = T$. Then $\Sigma_A[B_2] = \mathrm{Seq}(T, E_A[B_1])$.
The state after the statement $B_1$ is that before $B_1$ combined
by Seq with the effect of $B_1$.

3. Let $\Sigma_A[\text{if } c \text{ then } B_1 \text{ else } B_2 \text{ endif}] = T$. Then
$\Sigma_A[B_1] = \Sigma_A[B_2] = T$. The state before any branch
of an if-statement is the same as before the whole
if-statement.

4. Let $\Sigma_A[\text{do } v = e_1, e_2;\ B;\ \text{enddo}] = T$. Then $\Sigma_A[B] =
\mathrm{Seq}(T, \mathrm{Fold}(v, e_1, v-1, E_A[B]))$. The state before the
loop body B with the current value of loop variable v is
that before the loop combined by Seq with the effect of
all preceding iterations of B.

The form in rule 4 deserves some comments. Here the upper
limit in the Fold clause depends on v. To be formally correct,
we must replace all other occurrences of v in the clause with
a fresh variable, say $v'$. Thus, the resulting tree will (generally)
contain v, as it expresses the effect of all iterations of
the loop before the v-th iteration.
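As a small worked illustration of rules 2 and 4, consider a hypothetical fragment `S0: S = 0; do i = 1, n; S1: S = S + X(i); enddo` and the 0-dimensional array S. The fragment and the labels S0 and S1 are assumptions of this illustration, and the derivation is a sketch rather than output of our system. The effect of S0 is the term $S0\{\}$, and the effect of the loop body is $S1\{i\}$. Renaming the loop variable inside the body effect to a fresh $i'$, as explained above, gives:

$$
\begin{aligned}
\text{state before the loop} &= \mathrm{Seq}(S_{ini}\{\},\ S0\{\}) = S0\{\} \\
\mathrm{Fold}(i', 1, i-1, S1\{i'\}) &= (i - 1 > 0 \to S1\{i-1\} : \bot) \\
\text{state before } S1 &= \mathrm{Seq}\bigl(S0\{\},\ (i - 1 > 0 \to S1\{i-1\} : \bot)\bigr) \\
&= (i - 1 > 0 \to S1\{i-1\} : S0\{\})
\end{aligned}
$$

So the read of S in the first iteration takes its value from S0, and in every later iteration from the previous instance of S1.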

Using rules 1-4 one can compute the state at any internal
point of the program P. The following proposition
limits the usage, within a state tree T, of terms whose associated
statement is enclosed in a conditional with a non-affine
condition. It will be used further in Section 5.2.

Proposition 3. Let a conditional statement X with a non-affine
condition be at loop depth m within a dynamic control
program P. Consider a state tree $T_p = \Sigma_A[p]$ at a point
p within P over an array A. Let $A\{i_1, \ldots, i_k\}$ be a term in
$T_p$ whose associated statement, also A, is inside a branch
of X. Then the following claims are all true:

• $m \le k$,

• p is inside the same branch of X, and

• indexes $i_1, \ldots, i_m$ are just the variables of the loops enclosing X.

Proof. Let A be a term name whose associated statement
A is inside a branch b of a conditional statement X with a
non-affine condition. It is either an assignment to an array, say
A, or a blender node that emerged from some inner conditional
(performing a "conditional assignment" to A). From our way
of hiding predicate conditions described in Section 3.3 it follows
that the effect tree of X, $E_A[X]$, as well as that of any other
statement containing X, will not contain a term with name
A. Hence, due to our way of building states from effects
described above, this is also true for the state tree of any
point outside X, including the state $T_X$ before X itself.
Now, consider the state $T_p$ of a point p within a branch $b_1$
of X. (Below we shall see that $b_1 = b$.) We have

$$T_p = \mathrm{Seq}(T_X, T_{X-p}) \qquad (6)$$

where $T_{X-p}$ is the effect of executing the code from the
beginning of the branch $b_1$ to p (recall that the state before
the branch $b_1$, $T_{b_1}$, is the same as $T_X$ according to rule 3
above). Consider a term $A\{i_1, \ldots, i_k\}$ in $T_p$. As it is not
from $T_X$, it must be in $T_{X-p}$. Obviously, $T_{X-p}$ contains only
terms associated with statements of the same branch as
p. Thus, $b_1 = b$. And these terms are only such that their
initial m indexes are just the variables of the m loops surrounding
X. Thus, given that the operation Seq does not change term
indexes, we have the conclusion of Proposition 3. □

4.2 Resolving Array Accesses 

Using the states before each statement we can build the source
graph $F_P$. Consider a graph node X. So far, a source in its
ports clause contains terms representing array accesses, say
$A\{e_1, \ldots, e_k\}$. Recall that the node X is associated with
a point p in the AST. We take the state $\Sigma_A[p]$ and use it
to find the source for our access indexes $(e_1, \ldots, e_k)$ (just
doing substitution followed by pruning). The resulting S-tree
replaces the original access term. Doing so with each array
access term we obtain the source graph $F_P$.

Recall that each graph node X has a domain Dom(X),
which is a set of possible context vectors. It is specified by a
list of conditions, which are collected from surrounding loop
bounds and if-conditions. We write $D \Rightarrow p$ to indicate that
the condition p is valid in D (or follows from D). In the case
of a predicate condition $p = p^b\{e_1, \ldots, e_k\}$ it implies that
the list D just contains p (up to equality of the $e_i$). For the
S-graph built so far, the following proposition limits the usage
of atoms $A\{\ldots\}$ whose Dom(A) has a predicate condition.

Proposition 4. Suppose that B is a regular node (not a
blender) whose source tree T contains a term $A\{i_1, \ldots, i_k\}$
(referring to an assignment to an array A). Let $\mathrm{Dom}(A) \Rightarrow p$,
where $p = p^b\{j_1, \ldots, j_m\}$ is a predicate condition. Then:

• $m \le k$,

• $j_1 = i_1, \ldots, j_m = i_m$, and all these are just the variables
of the loops enclosing the conditional with predicate p,

• $\mathrm{Dom}(B) \Rightarrow p$.

Proof. As $\mathrm{Dom}(A) \Rightarrow p^b\{j_1, \ldots, j_m\}$, the predicate p denotes
the condition of a conditional statement X enclosed
by m loops with variables $j_1, \ldots, j_m$, and this X contains
the statement A in the branch b (by construction of Dom).
The source tree T was obtained by a substitution into the
state tree before B, $T_B = \Sigma_A[B]$, which must contain a
term $A\{i'_1, \ldots, i'_k\}$. It follows, by Proposition 3, that statement
B is inside the same branch b (hence, $\mathrm{Dom}(B) \Rightarrow p$),
$m \le k$ and $i'_1, \ldots, i'_m$ are just the variables $j_1, \ldots, j_m$. However,
the substitution replaces only formal array indexes and does
not touch enclosing loop variables, here $j_1, \ldots, j_m$. Hence
$i_1 = i'_1, \ldots, i_m = i'_m$. □


When B is a blender the assertion of Proposition 4 is
also valid, but Dom(B) should be extended with the conditions
on the path from the root of the source tree to the term
$A\{\ldots\}$. The details are left to the reader.

5. BUILDING THE DATAFLOW MODEL 
5.1 Inverting S-graph: Affine Case 

In our dataflow computation model a node producing a
data element must know exactly which other nodes (and by
which port) need this data element, and send it to all such
ports. This information should be placed, in the form of a
destination M-tree, into the eval clause of each graph node
instead of the initial placeholder $\bot$. These M-trees are obtained
by inversion of the S-graph.

First, we split each S-tree into paths. Each path starts
with a header term $R\{i_1, \ldots, i_n\}$, containing the list of independent
variables, ends with a term $W\{e_1, \ldots, e_m\}$ and has
a list of affine conditions interleaved with division clauses
like $(e =: kq + r)$ (k is a literal integer here). In all expressions,
variables may be either from the header or defined
by a division earlier. The InversePath operation produces the
inverted path that starts with a header term $W\{j_1, \ldots, j_m\}$
with new independent variables $j_1, \ldots, j_m$, ends with a term
$R\{f_1, \ldots, f_n\}$ and has a list of affine conditions and divisions
in between. The inversion is performed by variable elimination.
When a variable cannot be eliminated it is simply
introduced with a clause $(@v)$.
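A one-variable case gives the flavour of the inversion. Suppose a path with header $R\{i\}$, domain $lo \le i \le hi$ and final term $W\{a i + b\}$ with a literal integer $a \ge 1$. Eliminating i from $j = a i + b$ requires a division clause $(j - b =: a q + r)$ with the condition $r = 0$, after which $i = q$. The sketch below mirrors this by direct computation; the function names and the restriction to a single index are assumptions of the illustration, not the general InversePath operation.

```python
def forward(i, a, b):
    """Index of the write W{a*i + b} that the read R{i} takes its value from."""
    return a * i + b


def inverse_path(j, a, b, lo, hi):
    """Index i of the read R{i}, lo <= i <= hi, whose source is W{j}, or None."""
    q, r = divmod(j - b, a)       # division clause (j - b =: a*q + r)
    if r != 0:                    # condition r = 0: W{j} is hit by some integer i
        return None
    if q < lo or q > hi:          # domain conditions of the header, expressed via q
        return None
    return q                      # final term R{q}
```

When the write index does not determine i at all (for instance, a scalar read in every iteration), no such elimination is possible, and this is the case where the variable is introduced by an @-clause instead.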

All produced paths are grouped by new headers, each
group being an M-tree for the respective graph node, in the form
$(\&\, T_1\, T_2 \ldots)$ where each $T_i$ is a 1-path tree. Further, the
M-tree is simplified by the operation SimplifyTree. This operation
also involves finding bounds for @-variables, which
are then included into @-vertices in the form

$$(@v\,(l_1\, u_1)(l_2\, u_2) \ldots \to T)$$

where $l_i$, $u_i$ are the affine lower and upper bounds of the i-th interval,
and v must belong to one of the intervals.

5.2 Inverting S-graph for Programs with Non- 
affine Conditionals 

When program P has non-affine conditionals the above inversion
process will probably yield some M-trees with predicate
conditions. A node with such an M-tree needs an additional
port for the value of the predicate. We call such nodes
filters. The simplest filter has just two ports, one for the
main value and one for the value of the predicate, and sends
the main value to the destination when the predicate value is
true (or false) and does nothing otherwise. By splitting nodes
with complex M-trees we can always reduce our graph to
one with only the simplest filters.
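The behaviour of the simplest filter can be stated in a few lines. This is only a sketch of the firing rule for one node instance; the function name, the `sign` parameter and the `send` callback are hypothetical.

```python
def simplest_filter(value, predicate_value, sign, send):
    """Forward `value` through `send` only when the predicate has the expected sign."""
    if predicate_value == sign:
        send(value)               # otherwise the value is silently dropped


received = []
simplest_filter(3.5, True, True, received.append)    # predicate matches: forwarded
simplest_filter(4.0, False, True, received.append)   # predicate differs: dropped
```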

Generally, the domain of each arrow and each node may
have several functional predicates in the condition list. Normally,
an arrow has the same list of predicates as its source
and target nodes. However, sometimes these lists may differ
by one item. Namely, a filter node emits arrows with
a longer predicate list, whereas a blender node makes the
predicate list one item shorter compared to that of the incoming
arrow. In the examples below both green (dotted) and red
(dashed) arrows have an additional predicate in their domain.

However, our aim is to produce not only the U-graph, but
both the S-graph and the U-graph, which must be both complete
and mutually inverse. Thus, we prefer to update the
S-graph before inversion such that inversion would not produce
predicates in M-trees. To this end, we check for each port
whether its source node has enough predicates in its domain
condition list. When we see that the source node has fewer
predicates, we insert a filter node before that port.
The opposite case, in which the source has more predicates,
is impossible, as follows immediately from Proposition 4.

6. EXAMPLES 

A set of simple examples of a source program (subroutine)
with the two resulting graphs, the S-graph and the U-graph, is
shown in Figs. 8, 9 and 10. All graphs were generated as text
and then redrawn graphically by hand. Nodes are boxes
or other shapes, and data dependences are arrows between
them. Usually a node has several input ports and one output
port. The domain is usually shown once for a group of
nodes (at the top, in curly braces). The groups are
separated by a vertical line. Each node should be considered
as a collection of instance nodes of the same type that differ
from each other in their domain (context) parameters. Arrows
between nodes may fork depending on some condition (usually
an affine condition on domain parameters), which is then
written near the start of the arrow immediately after the
fork. When an arrow enters a node, it carries a new context (if
it has changed), written there in curly braces. The simplest,
purely affine example in Fig. 8 explains the notation.
Arrows in the S-graph are directed from a node port to its
source. The S-graph arrows can be interpreted as the flow
of requests for input values (see Section 7.2 for details).



Figure 8: Fortran program Sum (a), its S-graph (b)
and U-graph (c)

In the U-graph, arrows go from a node output to a node input.
In contrast with the S-graph, they denote the actual flow of data.
The U-graph semantics is described in Section 7.3.

In the U-graph we need to get rid of zero-port nodes, which
arise from assignments with a constant right-hand side. We insert
into them a dummy port that receives a dummy value. This is why a
node Start, which sends a token to node S1, appears in Fig. 8.

The simplest example with non-affine conditions is shown
in Fig. 9. Here a new kind of node appears, the blender,
depicted as a blue truncated triangle. Formally,
it has a single port, which receives data from two different
sources depending on the value of the predicate. Thus, it has
an implicit port for the Boolean value (on top). The main port
arrows go out from the sides; the true and false arrows are dotted
green and dashed red, respectively.

In the U-graph the blender does not use a condition: in
either case it gets a value on its single port without knowing
which node has sent it and under which condition. However,
as the source itself is not under the needed condition, a filter
node must be inserted between the source node and the
receiver port (it is shown in Fig. 9 as an inverted orange
trapezoid). A circle at the entry point means that the filter
is open when the condition is false.

Figure 9: Fortran program Max (a), its S-graph (b)
and U-graph (c)

    subroutine Bubble(A,n)
    real(8) A(0:n),Z
    do i=n,1,-1
      do j=1,i
        if (A(j-1)<A(j)) then
          Z=A(j)
          A(j)=A(j-1)
          A(j-1)=Z
        endif
      enddo
    enddo
    end

(a)

Figure 10: Fortran program Bubble (a), its S-graph
(b) and U-graph (c)

P(i) = R(i) < X(i)
B(i) = if P(i) then X(i) else R(i)
R(i) = if i = 1 then R1() else if i > 1 then B(i - 1) else ⊥
R1() = 0
Rout = if n = 0 then R1() else if n > 0 then B(n) else ⊥

Figure 11: System of recurrence equations equivalent
to the S-graph in Fig. 9

A more interesting example, a bubble sort program and its
graphs, is shown in Fig. 10. In contrast with the previous ones,
this U-graph exhibits high parallelism: the parallel time is
$2n$ instead of $n(n+1)/2$ for sequential execution.
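
The linear parallel time can be checked with a rough Python sketch. It does not execute the U-graph; it assumes a conservative dependence model of our own, in which every inner iteration (i, j) is one unit-time step that may read and write A(j-1) and A(j) and starts as soon as earlier steps touching those elements have finished. The function name bubble_schedule is hypothetical.

```python
def bubble_schedule(n):
    """Return (sequential_steps, parallel_depth) for Bubble with size n.

    Each inner iteration (i, j) is one unit-time step that touches
    A(j-1) and A(j); it starts when both elements are free.
    """
    ready = [0] * (n + 1)          # ready[k]: time when A(k) was last touched
    steps = 0
    depth = 0
    for i in range(n, 0, -1):      # do i = n, 1, -1
        for j in range(1, i + 1):  # do j = 1, i
            t = 1 + max(ready[j - 1], ready[j])
            ready[j - 1] = ready[j] = t
            steps += 1
            depth = max(depth, t)
    return steps, depth
```

Under this model the critical path has 2n - 1 steps against n(n+1)/2 sequential steps, which agrees with the linear estimate of about 2n; the exact constant in the U-graph depends on how node firings are counted.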

7. USAGE OF POLYHEDRAL MODEL 

7.1 General Form of Dataflow Graph 

The general syntax of the PM format is shown in the accompanying
figure. The S-graph is comprised of port source S-trees (single-valued),
whereas the U-graph is comprised of destination M-trees (multi-valued).
Both graphs must be mutually inverse, i.e., they represent
the same dependence relation.

Some nodes produce Boolean values, which can be used as
predicates. In the S-graph, they are allowed in a blender, which
is an identity node with a unique port source tree of the form
$(P\{e_1, \ldots, e_k\} \to T_1 : T_2)$. In the U-graph, we forbid predicates
in destination trees, but we allow filter nodes, which are in
some sense inverse to blenders. Instead of a destination tree of
the form $(P\{e_1, \ldots, e_k\} \to T_{out} : T)$, they have an additional
Boolean port $p$ with source $P\{e_1, \ldots, e_k\}$ and a destination
tree with just $p$ as the condition. Thus, a filter is a gate that is
open or closed depending on the value at port $p$. Note that
filters are needed in the U-graph, but not in the S-graph.

The S-graph must satisfy the following two constraints.
The first is a consistency restriction. Consider a node $X\{I\}$
with domain $D_X$ and a source tree $T$. Let $I \in D_X$. Then
$T(I)$ is some atom $Y\{J\}$ such that $J \in D_Y$. The second
constraint requires that the S-graph be well-founded,
which means that no object node $X\{I\}$ may transitively
depend on itself.

7.2 Using the S-graph as a Program 

The S-graph can be used to evaluate output values given
all input values (and structure parameters). For simplicity,
we assume that each node produces a single output value.

Following prior work, we transform the S-graph into a system of
recurrence equations (SRE), which can be treated as a recursive
functional program. Fig. 11 presents an SRE for
the S-graph from Fig. 9. Execution starts with an invocation
of the output node function. An evaluation step consists in evaluating
the right-hand side, calling other invocations recursively.
For efficiency it is worth using tabulation, so that no
function call is executed twice for the same argument list.

Note that the consistency and well-foundedness conditions
together guarantee the termination of the S-graph evaluation.
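
As an illustration, the SRE of Fig. 11 can be transcribed almost literally into Python, with tabulation provided by functools.lru_cache. This is a sketch rather than output of the translator: the wrapper name run_max_sre is hypothetical, the input array X is passed as a Python list, and the undefined value ⊥ is modelled by raising an exception.

```python
from functools import lru_cache


def run_max_sre(x):
    """Evaluate the SRE of Fig. 11; x[0] plays the role of X(1)."""
    n = len(x)

    def X(i):                      # input array, 1-based as in Fortran
        return x[i - 1]

    @lru_cache(maxsize=None)       # tabulation: each R(i) is computed once
    def R(i):
        if i == 1:
            return R1()
        if i > 1:
            return B(i - 1)
        raise ValueError("undefined value (bottom)")

    def R1():
        return 0

    @lru_cache(maxsize=None)
    def P(i):
        return R(i) < X(i)

    @lru_cache(maxsize=None)
    def B(i):                      # blender: chooses between two sources
        return X(i) if P(i) else R(i)

    # Rout: invocation of the output node function
    return R1() if n == 0 else B(n)
```

Evaluation is demand-driven: calling the output node pulls B(n), which pulls P(n) and R(n), and so on down to R1(). Without the cache, R(i) would be evaluated twice for every i, once from P(i) and once from B(i). The recursion depth grows linearly with n, so the sketch is meant for small inputs only.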

7.3 Computing the U-graph in the Dataflow 
Computation Model 

The U-graph can be executed as a program in the dataflow
computation model. A node instance with concrete context
values fires when all its ports have received a data element in the
form of a data token. Each fired instance is executed by computing
all its eval clauses sequentially. All port and context values
are used as data parameters in the execution. In each eval
clause the expression is evaluated, and the obtained value is assigned
to a local variable and then sent out according to the
destination M-tree. The tree is executed in an obvious way.
In a conditional vertex, the left or right subtree is executed
depending on the Boolean value of the condition. In
&-vertices, all subtrees are executed one after another. An
@-vertex acts as a do-loop with the specified bounds. Each term
of the form $R.x\{f_1, \ldots, f_n\}$ acts as a token send statement
that sends the computed value to port $x$ of graph node $R$
with the context built from the values of $f_i$. The process stops
when all output nodes have received their tokens or when all activity
stops (quiescence condition). To initiate the process, tokens
must be sent from outside to all necessary input nodes.
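
The firing rule can be illustrated by a small token-driven interpreter in Python. The sketch assumes a hypothetical Sum program (s = 0, followed by a loop s = s + A(i) for i = 1..n); the node names S1, S2 and Out, the port names, and the function run_sum are our own assumptions, since the original figure is not reproduced here.

```python
from collections import defaultdict, deque


def run_sum(a):
    """Token-driven run of a hypothetical Sum U-graph over list a."""
    n = len(a)
    ports = {"S1": ("start",), "S2": ("s", "a"), "Out": ("s",)}
    inbox = defaultdict(dict)      # (node, context) -> {port: value}
    queue = deque()                # pending tokens
    result = []

    def send(node, ctx, port, value):
        queue.append((node, ctx, port, value))

    def fire(node, ctx, args):
        if node == "S1":           # s = 0
            s = 0
            # destination tree: if n >= 1 then S2.s{1} else Out.s{}
            if n >= 1:
                send("S2", (1,), "s", s)
            else:
                send("Out", (), "s", s)
        elif node == "S2":         # s = s + A(i)
            (i,) = ctx
            s = args["s"] + args["a"]
            if i < n:
                send("S2", (i + 1,), "s", s)
            else:
                send("Out", (), "s", s)
        else:                      # output node
            result.append(args["s"])

    # Tokens sent from outside: the dummy start token and the inputs.
    send("S1", (), "start", None)
    for i, value in enumerate(a, start=1):
        send("S2", (i,), "a", value)

    while queue:                   # run until quiescence
        node, ctx, port, value = queue.popleft()
        box = inbox[(node, ctx)]
        box[port] = value
        if len(box) == len(ports[node]):   # all ports filled: fire
            fire(node, ctx, inbox.pop((node, ctx)))
    return result[0]
```

Each node instance is identified by its name and context tuple, and it fires exactly once, when its last missing port receives a token. The order in which tokens are taken from the queue does not affect the result in this sketch.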

7.4 Extracting Source Function from S-graph 

There are two ways to extract the source function from the
S-graph. First, we may use the S-graph itself as a program
that computes the source for a given read when the iteration
vector of the read as well as the values of all predicates are
available. We take the SRE and start evaluating the term
$R\{i_1, \ldots, i_n\}$, where $i_1, \ldots, i_n$ are known integers. We stop
as soon as a term of the form $W\{j_1, \ldots, j_m\}$ is encountered
at the top level (not inside predicate evaluation), where $W$
is a node name corresponding to a true write operation (not
a blender) and $j_1, \ldots, j_m$ are some integers.
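
For the Max example this first way can be sketched in Python as follows. The sketch assumes that the conditional assignment inside the loop is a write node named W and that the initial assignment is the node R1; both the name W and the function source_of_R are hypothetical. Predicate values are supplied as a dictionary indexed by iteration number.

```python
def source_of_R(i, pred):
    """Source of the read of R in iteration i of Max (i >= 1).

    pred[k] is the known value of predicate P(k) for k = 1..i-1.
    Returns ("R1",) or ("W", k), the true write that produced the value.
    """
    while True:
        if i == 1:
            return ("R1",)         # R(1) = R1(): a true write, stop
        k = i - 1                  # R(i) = B(i-1): a blender, unfold it
        if pred[k]:
            return ("W", k)        # true branch: the write in iteration k
        i = k                      # false branch: B(k) passes on R(k)
```

For instance, with pred = {1: True, 2: False, 3: False} the read in iteration 4 is resolved to ("W", 1): the two blenders for iterations 3 and 2 are passed through, and unfolding stops at the first true write.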

Also, it is possible to extract the general definition
of the source function for a given read in a program. We
start from the term $R\{i_1, \ldots, i_n\}$, where $i_1, \ldots, i_n$ are symbolic
variables, and proceed by unfolding the S-graph symbolically
into just the S-tree. Having encountered a predicate
node, we insert a branching with a symbolic predicate condition.
Having encountered a term $W\{e_1, \ldots, e_m\}$ for a regular
assignment statement $W$, we stop unfolding the branch.
Having encountered a term for a blender node, we unfold it
further; this way we avoid taking our artificial dummy assignments
as a source. Proceeding this way, we generate
a possibly infinite S-tree (with predicate vertices) representing
the source function in question. If we are lucky, the S-tree
will be finite. It seems that in earlier approaches the exact result
(in the same sense) is produced only when the above process yields
a finite S-tree.

But we get a good result even when the generated S-tree is
infinite (this is the case in the examples Max and Bubble). Using
a technique like supercompilation, it is possible to fold
the infinite S-tree into a finite cyclic graph.

8. RELATED WORK 

The foundations of dataflow analysis for arrays were
well established in the 1990s by Feautrier, Pugh,
Maslov and others. Their methods use the Omega and
PIP libraries and yield an exact dependence relation for any
pair of read and write references in an affine program. Thus,
our work adds almost nothing for the affine case (besides
producing a program in the dataflow computation model).
However, for non-affine conditions, the state of the art is
generally a fuzzy solution, in which the source function
produces a set of possible sources. The authors claim that
nothing more can be done. But everything depends on the form in
which we want to see the result. Sometimes one may be satisfied
with the source function expressed in the form of a finite
quast extended with predicate vertices. Then why not allow
a slightly more general form, an S-graph with predicate nodes,
or an SRE? The main thing is that the result be good for something.

Known translations from WDP to KPN usually rely
on FADA and seem to succeed only when FADA succeeds
in being exact (judging by the examples used). It would be interesting
to see how they handle the Bubble Sort in Fig. 10.

Our base affine machinery for building the exact PM also
differs. While it is common to consider each read-write pair
separately and then combine the results, our method first
produces effects and states using only writes, and then resolves
each read against the respective state. It is interesting
to compare our effect/state building process with that
of backward traversal of the control flow graph. Both processes
move along the same path but in opposite directions.
Authors usually argue for moving backward, noting that the
process can stop when the total source is found (cf. also
[10]). We hope to obtain the same effect just by implementing
our algorithm in a lazy language. Then the tree T will not
be built at all in calls like Seq(T, t), where t is a term.

In principle, the way we deal with non-affine conditionals
can be reformulated as follows: (1) push all dynamic ifs to
the innermost level; (2) add else parts that just assign the
existing value to the same variable; (3) do FADA, identifying
different copies of each predicate value; (4) collect the
resulting exact source functions as an S-graph, or SRE; (5) use
this SRE as a recursive definition of the true source (skipping
all dummy assignments introduced in step 2). I am grateful
to the IMPACT reviewers for pointing out this relationship.


9. CONCLUSION 

Our aim was to convert a program P of a specific class into 
the dataflow computation model. Thus we need not only to 
build the exact and complete polyhedral model (which is a 
set of exact source functions for all reads in P), but also to 
invert it and thus obtain the exact use set function for each 
write in P. The latter form can be used as a program in 
a dataflow computation model. A prototype translator is 
implemented in Refal-6. It admits an arbitrary WDP.

The work was supported by the Russian Academy of Sciences
Presidium Program for Fundamental Research "Fundamental
Problems of System Programming" in 2009-2014.
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